{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Use BigQuery DataFrames to run Anthropic LLM at scale\n",
    "\n",
    "<table align=\"left\">\n",
    "\n",
    "  <td>\n",
    "    <a href=\"https://colab.research.google.com/github/anniexudan/bqtoclaude/blob/main/Python_Notebook_Sample/BigFrames%2BClaude_Remote_Function.ipynb\">\n",
    "      <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Colab logo\"> Run in Colab\n",
    "    </a>\n",
    "  </td>\n",
    "  <td>\n",
    "    <a href=\"https://github.com/anniexudan/bqtoclaude/blob/main/Python_Notebook_Sample/BigFrames%2BClaude_Remote_Function.ipynb\">\n",
    "      <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\">\n",
    "      View on GitHub\n",
    "    </a>\n",
    "  </td>                                                                                               \n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Overview\n",
    "\n",
    "Anthropic Claude models are available as APIs on Vertex AI ([docs](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude)).\n",
    "\n",
    "To run the Claude models at large scale data we can utilze the BigQuery\n",
    "DataFrames remote functions ([docs](https://cloud.google.com/bigquery/docs/use-bigquery-dataframes#remote-functions)).\n",
    "BigQuery DataFrames provides a simple pythonic interface `remote_function` to\n",
    "deploy the user code as a BigQuery remote function and then invoke it at scale\n",
    "by utilizing the parallel distributed computing architecture of BigQuery and\n",
    "Google Cloud Function.\n",
    "\n",
    "In this notebook we showcase one such example. For the demonstration purpose we\n",
    "use a small amount of data, but the example generalizes for large data. Check out\n",
    "various IO APIs provided by BigQuery DataFrames [here](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.pandas#bigframes_pandas_read_gbq)\n",
    "to see how you could create a DataFrame from your Big Data sitting in a BigQuery\n",
    "table or GCS bucket."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Set Up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up a claude model in Vertex\n",
    "\n",
    "https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude#before_you_begin"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install Anthropic with Vertex if needed\n",
    "\n",
    "Uncomment the following cell and run the cell to install anthropic python\n",
    "package with vertex extension if you don't already have it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install anthropic[vertex] --quiet"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define project and location for GCP integration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROJECT = \"\"\n",
    "LOCATION = \"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Initialize BigQuery DataFrames dataframe\n",
    "\n",
    "BigQuery DataFrames is a set of open source Python libraries that let you take\n",
    "advantage of BigQuery data processing by using familiar Python APIs.\n",
    "See for more details https://cloud.google.com/bigquery/docs/bigquery-dataframes-introduction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import BigQuery DataFrames pandas module and initialize it with your project\n",
    "# and location\n",
    "\n",
    "import bigframes.pandas as bpd\n",
    "bpd.options.bigquery.project = PROJECT\n",
    "bpd.options.bigquery.location = LOCATION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's use a DataFrame with small amount of inline data for demo purpose.\n",
    "You could create a DataFrame from your own data. See APIs like `read_gbq`,\n",
    "`read_csv`, `read_json` etc. at https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.pandas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "Query job a1e84aa5-199e-4a26-b740-068b98a6539d is DONE. 0 Bytes processed. <a target=\"_blank\" href=\"https://console.cloud.google.com/bigquery?project=bigframes-dev&j=bq:europe-west1:a1e84aa5-199e-4a26-b740-068b98a6539d&page=queryresults\">Open Job</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Query job 9203bc82-ca4b-4d76-b1ae-60e98e4e27cc is DONE. 0 Bytes processed. <a target=\"_blank\" href=\"https://console.cloud.google.com/bigquery?project=bigframes-dev&j=bq:europe-west1:9203bc82-ca4b-4d76-b1ae-60e98e4e27cc&page=queryresults\">Open Job</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>questions</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>What is the capital of France?</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Explain the concept of photosynthesis in simpl...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Write a haiku about artificial intelligence.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 1 columns</p>\n",
       "</div>[3 rows x 1 columns in total]"
      ],
      "text/plain": [
       "                                           questions\n",
       "0                     What is the capital of France?\n",
       "1  Explain the concept of photosynthesis in simpl...\n",
       "2       Write a haiku about artificial intelligence.\n",
       "\n",
       "[3 rows x 1 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = bpd.DataFrame({\"questions\": [\n",
    "   \"What is the capital of France?\",\n",
    "   \"Explain the concept of photosynthesis in simple terms.\",\n",
    "   \"Write a haiku about artificial intelligence.\"\n",
    " ]})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Use BigQuery DataFrames `remote_function`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create a remote function from a custom python function that takes a prompt\n",
    "and returns the output of the claude LLM running in Vertex. We will be using\n",
    "`max_batching_rows=1` to control parallelization. This ensures that a single\n",
    "prompt is processed per batch in the underlying cloud function so that the batch\n",
    "processing does not time out. An ideal value for `max_batching_rows` depends on\n",
    "the complexity of the prompts in the real use case and should be discovered\n",
    "through offline experimentation. Check out the API for other ways to control\n",
    "parallelization https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.pandas#bigframes_pandas_remote_function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "@bpd.remote_function(packages=[\"anthropic[vertex]\"], max_batching_rows=1)\n",
    "def anthropic_transformer(message: str) -> str:\n",
    "  from anthropic import AnthropicVertex\n",
    "  client = AnthropicVertex(region=LOCATION, project_id=PROJECT)\n",
    "\n",
    "  message = client.messages.create(\n",
    "              max_tokens=1024,\n",
    "              messages=[\n",
    "                  {\n",
    "                      \"role\": \"user\",\n",
    "                      \"content\": message,\n",
    "                  }\n",
    "              ],\n",
    "              model=\"claude-3-5-sonnet@20240620\",\n",
    "          )\n",
    "  content_text = message.content[0].text if message.content else \"\"\n",
    "  return content_text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the BigQuery remote function created\n",
    "anthropic_transformer.bigframes_remote_function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the cloud function created\n",
    "anthropic_transformer.bigframes_cloud_function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Apply the remote function on the user data\n",
    "df[\"answers\"] = df[\"questions\"].apply(anthropic_transformer)\n",
    "df"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
